(54) Methods of refining descriptors

(57) A database containing data items, such as images, text, audio records or video records, in both summary and complete forms, is searched by reference to descriptors associated with the data items, in response to user requests. The search result contains the summary form of the selected items. Responses from users indicative of the utility of the search results (implicit in the users' subsequent actions in requesting the complete forms of specific selected items) are used to refine the descriptors to alter subsequent search results.
Description

Technical Field

[0001] This invention relates to methods of refining descriptors, for example such as are used for retrieving data items from databases.

Background Art 

[0002] A major obstacle to the efficient retrieval of data is the way they are indexed (i.e. how descriptors or keywords are selected). Currently there are two common ways of indexing:

1. The use of an automatic indexing tool to extract words from text documents or recognize forms and elements in images, videos, etc. This is based on artificial intelligence (AI) techniques and has the limits that this technology offers.

2. One or more people do the indexing manually after a close analysis of the data. This is usually accurate but reliant on the vocabulary of the indexer and their perception of the data. (It may be very subjective for images, for instance.) It is also time-consuming.

[0003] Both of these techniques provide a set of indexing keywords or descriptors which are static, and which very often belong to a vocabulary that is inconsistent and limited. However, people querying the system in effect provide possible keywords in their queries. The keywords in the queries may not be existing descriptors but they are relevant to the data. Currently this information is left unused and forgotten by the system once the user quits the system. As a result, if the indexing keywords are inappropriate, nothing can be done to improve them even if some people may provide good indexing terms as they search.

[0004] If the terminology commonly used changes over time (for example, one technical term becomes superseded by another), then it becomes necessary to redo all the indexing, which is undesirable, especially as databases become bigger and bigger.

[0005] Thus there is a need to describe data so that they will be more easily searchable by the majority of the community.

Disclosure of Invention 

[0006] According to one aspect of this invention there is provided a method of refining descriptors associated with data items to enable retrieval thereof, comprising the steps of:

storing said data items in both a summary form and in a complete form;

receiving a search request from a user for selection of data items, said request incorporating at least one descriptor;

sending the user a search result comprising only the summary form of data items selected in accordance with said search request; and

using a response of the user requesting the complete form of a selected data item in the search result to guide modifications of the descriptors.

Brief Description of Drawings

[0007] Methods and apparatus in accordance with this invention for refining descriptors associated with data items will now be described, by way of example, with reference to the accompanying drawings, in which:

Figures 1 and 2 are used in describing different indexing systems;

Figure 3 is a block schematic diagram of a system for implementing the invention;

Figure 4 illustrates adaptive indexing of an image;

Figure 5 shows various stages involved in managing inclusion of an item in or omission of that item from a set of descriptors;

Figure 6 illustrates operation of a technique for adjusting keyword weights; and

Figure 7 shows an exponential function used in deciding weighting factors applied to descriptors.

[0008] Keywords or descriptors available in a database system to lead a searcher to a particular item (such as a document or image) are often different from the descriptors which best describe the content of that item. This is what makes information retrieval sometimes inaccurate and unsuccessful. In a traditional retrieval system which provides a static index by algorithmic means, the index can be represented by a data item-keyword sparse matrix M of fixed dimensionality (see Figure 1). An entry in the matrix M(d, i) is a number and has the meaning "the data item d has been indexed with a keyword i". Binary information or keyword frequencies can be stored and this leads to traditional binary or probabilistic retrieval systems. The main weakness of the static indexing approach is a user-system vocabulary mismatch (need for thesauri, stemming and fuzzy matching) and a need for mapping a user query into the indices that the system uses.

[0009] It would be desirable to capture the keywords provided by users while searching, and associate them with the items the users retrieved. In this way it can be ensured that items are usefully associated with the keywords people actually use to retrieve them.

[0010] An example of the invention is referred to herein as adaptive indexing, where the items are indexed by reference to contributions (explicit or implicit) from the entire community as people search the data. With adaptive indexing the system is capable of capturing the keywords entered by the user community. The information captured from the user interaction during the process of searching and browsing the results leads to automatic thesaurus build-up, gradual convergence of the system's keywords to the user population vocabulary and indexes which are always up-to-date. The dynamic index could be visualised as a list of keywords (Figure 2) with the containing list enumerating all the data items and the contained lists enumerating all the keywords for a given data item. Keywords could also have scores attached to them according to their degree of relevance for a data item. The lists expand when a new keyword is entered.
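
The list-of-lists view of the dynamic index can be sketched as a nested mapping. This is only an illustration: the name `dynamic_index`, the item identifier `img-001` and the starting score of 0.1 are assumptions, not values taken from the description.

```python
from collections import defaultdict

# Hypothetical dynamic index: data item id -> {keyword: score}.
# The outer mapping plays the role of the containing list (all data items);
# each inner mapping plays the role of a contained list (that item's keywords).
dynamic_index = defaultdict(dict)


def add_keyword(index, item_id, keyword, initial_score=0.1):
    """Attach a keyword to an item; the inner list expands only if it is new."""
    index[item_id].setdefault(keyword, initial_score)


add_keyword(dynamic_index, "img-001", "pig")
add_keyword(dynamic_index, "img-001", "farm")
add_keyword(dynamic_index, "img-001", "pig")  # already present: nothing changes
```

Unlike a fixed-dimension matrix, this structure needs no resizing when a previously unseen keyword arrives.
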
[0011] In this scenario, each item or piece of data has a set of associated dynamic descriptors or keywords, which are not static but which can change with time as a result of information from people searching. Each descriptor for a given piece of data has a weight that measures its relevance to that piece of data. The value of the weight is statistically determined by the searches. At an instant in time, the descriptor with the highest weight at that time is considered to be the best description of the piece of data because it was the most popular description of the piece of data given by people using the system. The descriptor with the smallest weight is not very relevant to the piece of data, and if its weight continues to decrease then at some point the descriptor may be removed (garbage collected).

[0012] Feedback from users may be explicit (e.g. users provide comments on how useful or relevant the search result was) or implicit (e.g. a system monitors whether people make purchases in relation to the content of search results). In the example described below with reference to Figures 3 and 4 the feedback is implicit, i.e. the user does not know about the learning process going on in the system. Consequently, the system has to evaluate the associations that are implied by the user's actions.

[0013] Referring to Figure 3, a user 10 operates a computer terminal 12 to send search requests via a communications link 14 (e.g. in a computer communications network) to an input/output interface 16. The interface 16 forwards the search requests to a processor 18 which executes software program instructions stored in a memory 20 to search a store 22 holding data items and associated descriptors for indexing them. The program instructions may include, for example, search engine functions for scanning for data items associated with specified descriptors, and functions for managing the associations with descriptors.

[0014] Each data item (document, image, etc.) is held in the store 22 in two different forms:

- a "summary form" where it is minimally described so that the user can quickly access its contents. In the case of images, the summary may be a "thumbnail" (a reduced-scale, low resolution version of the full image) with possibly an image title or some other information such as the photographer's or artist's name, or an image reference number. By quickly examining this summary, the user can form an initial opinion on whether this item may be relevant to their query.

- a "complete form", which contains all the necessary information about the item to enable a decision to be made on whether it is relevant to the current query or not. For an image, for instance, this complete form would comprise the image at a good enough resolution to enable details of the composition as well as the quality of the image to be assessed. In a remote technical support system it could be the full history of a call for support by a user, and the assistance and advice given (Question / Answer / Comments / Pointers to relevant documents etc.).

[0015] The user 10 first provides a query, and the processor 18 responds with a list of summaries of items which appear potentially relevant. The user browses through this result list, and upon finding an item which appears from its summary to be similar to what is being sought, she will access the complete form of that item. The processor 18, under the control of the program in the memory 20, treats this choice as an implicit signal that an association has been made between the initial query (list of descriptors) and that item in the database. Consequently it updates the descriptors of the selected item accordingly in the store 22. All the keywords in the query are associated with the selected item in this updating, irrespective of whether or not they were already descriptors for that item - this is how the system applies new keywords.

[0016] Thus, referring to Figure 4, a picture of a pig with a litter of piglets standing by a cluster of flowers may already be indexed according to the terms pig, farm, family, flowers and field. A user may enter a search query containing the terms pig, piglet, farm, countryside and family. Because countryside and piglet are not already present as descriptors for this picture, the system adds them. It also adjusts (increases) the weighting of the descriptors in the query which are already associated with this item (such as pig and farm). If flowers persists with zero weighting it is eventually removed as a descriptor for this image.
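
The pig example can be traced with a small sketch of the update performed when a complete form is requested. The additive step, the starting weights and the function name `record_complete_form_request` are illustrative assumptions; the description leaves the exact weight arithmetic to the techniques discussed later.

```python
def record_complete_form_request(descriptors, query_terms,
                                 initial_weight=0.1, step=0.1):
    """Update one item's descriptor weights after its complete form is requested.

    Query terms already present are reinforced (capped at 1.0); query terms
    not yet present are added with a small initial weight. The numeric
    values are assumptions chosen only for illustration.
    """
    for term in query_terms:
        if term in descriptors:
            descriptors[term] = min(1.0, descriptors[term] + step)
        else:
            descriptors[term] = initial_weight
    return descriptors


# Existing descriptors of the picture (weights are made up for the example).
picture = {"pig": 0.5, "farm": 0.5, "family": 0.5, "flowers": 0.0, "field": 0.5}
query = ["pig", "piglet", "farm", "countryside", "family"]
record_complete_form_request(picture, query)
```

After the call, piglet and countryside appear as new low-weight descriptors, pig, farm and family are reinforced, and flowers and field are left untouched.
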

[0017] The evolution of the weights of the descriptors for a given data item is tailored by the interactions of the users. The more users associate a descriptor with a particular data item, the higher the resulting weight. If users' description of the data item changes (through, for example, evolution of terminology, historical events, new terminology, a new data domain or a new set of users) the descriptors will evolve according to the majority opinion of the community of users searching the data.

[0018] As this technique relies on purely implicit indications, the possibility of some inappropriate associations cannot be avoided. For instance the user 10 may be seeking a picture of a lion eating its prey. She may enter the query "lion eating prey", and the processor 18 returns a picture of an antelope resting in the shadow of a tree. Although the user is not interested in buying rights to use this image she finds it appealing and requests the complete form to see the image in more detail out of pure curiosity. The keyword capture process implemented by the processor 18 will reflect this action by reinforcing or re-indexing this image of an antelope with the keywords "lion", "eating" and "prey", with "lion" perhaps becoming a new descriptor in the process. It is also possible to make an inappropriate association by associating an item with a misspelled keyword.

[0019] However such inappropriate associations should have a minimal impact on the system, because an individual association does not change the keyword weighting very much. A newly associated keyword does not acquire maximal significance immediately after the first association is made. In other words, more than a single association is needed to radically change the indexing; for example, in one implementation, new keywords are not used for search until five associations with that keyword have been made. Consequently a misspelled keyword will become a valid descriptor for the data item only if it is a common misspelling.
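
The five-association rule mentioned for one implementation can be illustrated with a simple counter. This is a minimal sketch: the names `ASSOCIATIONS_NEEDED`, `pending` and `register_association` are hypothetical, and a real system would express the rule through keyword weights rather than raw counts.

```python
ASSOCIATIONS_NEEDED = 5  # the figure quoted for one implementation


def register_association(pending, keyword):
    """Count one more association; return True once the keyword is searchable."""
    pending[keyword] = pending.get(keyword, 0) + 1
    return pending[keyword] >= ASSOCIATIONS_NEEDED


pending = {}
# Five separate users associate "piglet" with the same picture.
history = [register_association(pending, "piglet") for _ in range(5)]
# A one-off misspelling never reaches the threshold.
one_off = register_association(pending, "pgilet")
```

Only the fifth association makes "piglet" searchable, while the single misspelled entry stays inactive.
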

[0020] So far adaptive indexing has been described essentially as a process happening when access is made to the complete form of a data item for preview/review. However it is also possible to introduce multiple levels of impact on the indexing. For example, it is possible to reinforce the descriptors further if the user 10 decides actually to buy rights to use an image. This is equivalent to adding an extra level of relevance feedback that would be more explicit, and it reduces the potential risks of purely implicit feedback. Explicit feedback could also be introduced at the end of the retrieval phase, especially for remote support systems where customer satisfaction is often monitored.
[0021] Each association between a descriptor or keyword and a data item has a weight, which is a value between 0 and 1. This weight may be implemented in either of two ways:

- data item focussed: weights are associated with data items and normalisation is done relative to data items; this implies defining how a data item is described;
- keyword focussed: weights are associated with keywords and normalisation is done relative to a keyword; this implies defining what a keyword means.

In the present embodiment keyword focussed weighting is used. This is mainly to ensure that specialist keywords which are very rarely used (but which are extremely good descriptors) nonetheless strongly influence the result of a query. If the data item focussed approach were used, the weight of such rarely used keywords would be small compared to the weight of other, more common, keywords. Thus a query combining a common keyword and an unusual keyword would yield a result with many items matching the common keyword, possibly swamping the items matching the unusual though highly relevant descriptor. With the keyword focussed approach the weight of this unusual descriptor is high, as the number of data items described by this specific keyword is small.
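
The difference between the two normalisations can be seen on a toy set of association counts. The counts, item identifiers and function names below are invented for illustration, and plain count ratios stand in for whatever weighting scheme an implementation actually uses.

```python
# Hypothetical raw association counts: item -> {keyword: count}.
counts = {
    "img-1": {"animal": 40, "okapi": 4},
    "img-2": {"animal": 60},
}


def item_focussed(counts, item, keyword):
    """Normalise relative to the data item: share of the item's associations."""
    total = sum(counts[item].values())
    return counts[item].get(keyword, 0) / total if total else 0.0


def keyword_focussed(counts, item, keyword):
    """Normalise relative to the keyword: share of the keyword's associations."""
    total = sum(per_item.get(keyword, 0) for per_item in counts.values())
    return counts[item].get(keyword, 0) / total if total else 0.0


rare_by_item = item_focussed(counts, "img-1", "okapi")        # 4 / 44
rare_by_keyword = keyword_focussed(counts, "img-1", "okapi")  # 4 / 4
common_by_keyword = keyword_focussed(counts, "img-1", "animal")  # 40 / 100
```

The rarely used keyword gets a small weight under item-focussed normalisation but the maximum weight under keyword-focussed normalisation, which is the behaviour the embodiment is after.
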

[0022] This choice has the side effect of giving increased importance to popular data items (as far as the weights are concerned). From a commercial viewpoint this is advantageous: in the case of images, for instance, there are often images which are particularly popular at one time, according to current fashion.

[0023] The keyword weight is used to evaluate the importance of a keyword for a particular data item, and also to rank the results of a user query. In the embodiment described herein for each keyword three different weight values are distinguished, which determine the status of that keyword:

$w_k^{0}$: the initial weight when the keyword first enters the system;

$w_k^{User}$: the threshold for a keyword to become searchable (i.e. be taken into account in determining whether a data item should be included in the result of a user query);

$w_k^{GC}$: the threshold below which a keyword becomes "garbage collected" (i.e. no longer used in assembling results for user queries).

The specific calculation of each of these values depends on the adaptive indexing algorithm adopted.

[0024] Further each keyword has a status, determined by these weight values, which reflects the influence of the users as implied by their reactions to query results:

1. Master keyword: the master keyword is provided by the content provider or by a professional indexer (this is the original descriptor). This keyword cannot be removed by the system without the explicit consent of a supervisor. This is because some terms are very specific to the data item, or are even key descriptors, but are not frequently used because the average user (general public) is not familiar with them. Nonetheless their inclusion enables specialists to access the data item quickly.

2. User keyword: this keyword has been provided by a user (general public); it is searchable because a significant number of people have already associated this keyword with a particular data item.

3. Candidate keywords: there are two different types of candidate keyword reflecting two different types of transition for a keyword. In either case they are not "active" (not searchable). They will be garbage collected when their weight falls below a given value ($w_k^{GC}$) or become user keywords if their weight rises above a given value ($w_k^{User}$).

- candidate for User keyword: this is the initial status of a new keyword entering the system. This keyword is not yet searchable, because it could be misspelled or an inappropriate association as described above. This status reduces the risk of the search process being slowed by the presence of large numbers of "junk" keywords. However, this also makes the addition of a new keyword harder, as it has to be used in association with an existing keyword for a data item several times before it exceeds the threshold to become a User keyword itself;

- candidate for Garbage Collection: this status is reserved for User keywords whose weight decreases to its original introduction weight ($w_k^{0}$), i.e. they became User keywords but were very rarely used afterwards. This could happen, for example, either by evolution of the vocabulary or by entry of an inappropriate keyword at some point in time.
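
The status rules listed above can be sketched as a small transition function. The threshold values and the names `Status` and `next_status` are assumptions for illustration, with the ordering w_gc < w0 < w_user taken as given; the figure showing the transitions is not reproduced here, so this follows the prose only.

```python
from enum import Enum


class Status(Enum):
    MASTER = "master"
    USER = "user"
    CANDIDATE_USER = "candidate for user"
    CANDIDATE_GC = "candidate for garbage collection"
    REMOVED = "garbage collected"


def next_status(status, weight, w0=0.2, w_user=0.5, w_gc=0.05):
    """Return the status implied by the current weight (thresholds are assumed)."""
    if status in (Status.MASTER, Status.REMOVED):
        return status  # master keywords are never removed by the system
    if status is Status.USER:
        # A user keyword that sinks back to its introduction weight is demoted.
        return Status.CANDIDATE_GC if weight <= w0 else Status.USER
    # Both candidate types share the same two exits.
    if weight > w_user:
        return Status.USER
    if weight < w_gc:
        return Status.REMOVED
    return status
```

For instance, `next_status(Status.CANDIDATE_USER, 0.6)` promotes the keyword to a user keyword, while a master keyword keeps its status whatever its weight.
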

[0025] Figure 5 shows the possible transitions between these different possible statuses, and the corresponding values for the weight $w_{ik}$ which result in each transition.

[0026] Various different techniques for varying the value of the weight $w_{ik}$ can be used. Two are described below. The first is a straightforward interpretation of simple probabilistic rules. The second one is more empirical and aims at forcing the weights to evolve following an exponential curve.
[0027] In the first technique the weight is fixed for a given period of time. At the end of each period, the weight is re-evaluated according to the extent of association which has occurred during that period. The duration of a period is the only arbitrary parameter. It depends on the total number of data items and on the extent of use of the search system (number of queries per day for instance). At the beginning of each period, for each keyword k, two counters are set to zero:

$C_k$ represents the number of times the keyword has been associated with data items (irrespective of whether it was with different data items or many times with the same data item);

$C_{ki}$ represents the number of times the keyword has been associated with data item i.

At the end of the period, the weight of the association between a keyword and a data item is defined by

$$w_{ik} = \begin{cases} C_{ki} / C_k & \text{if } C_k \text{ is different from } 0 \\ 0 & \text{otherwise} \end{cases}$$

In other words, the weight represents the probability that the data item i is indexed by the keyword k. Under these circumstances, the starting weight $w_k^{0}$ for a new keyword will be proportional to $1/C_k$. The two other thresholds $w_k^{User}$ and $w_k^{GC}$ are arbitrary and will be the same for all the keywords. A disadvantage of this method is that the history associated with a weight is rather limited and very dependent on the activity of the search system, and more specifically on the extent of use of the keyword.
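
The per-period counting can be sketched as follows. The function name `period_weights` and the event format (a list of keyword/item pairs observed during one period) are assumptions made for the example.

```python
from collections import Counter


def period_weights(associations):
    """Compute w_ik = C_ki / C_k from the associations seen in one period.

    `associations` is an iterable of (keyword, item) pairs. Pairs that never
    occurred are simply absent from the result, which corresponds to weight 0.
    """
    c_k = Counter()   # associations per keyword, over all items
    c_ki = Counter()  # associations per (keyword, item) pair
    for keyword, item in associations:
        c_k[keyword] += 1
        c_ki[(keyword, item)] += 1
    return {pair: n / c_k[pair[0]] for pair, n in c_ki.items()}


events = [("pig", "img-1"), ("pig", "img-1"), ("pig", "img-2"), ("farm", "img-1")]
weights = period_weights(events)
```

Here "pig" is associated twice with `img-1` and once with `img-2`, so its weights are 2/3 and 1/3, while "farm" was only ever associated with `img-1` and gets weight 1.
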

[0028] The lists of keywords could be sorted according to normalised scores and divided into quantized intervals of fixed length in proportion to each keyword's probability of indexing a data item. Keywords would compete on the basis of weight to be promoted to the higher interval and would be moved down to the lower interval by more appropriate keywords. Conditions can be specified for crossing interval boundaries to prevent keywords oscillating between intervals (see Figure 6). Keyword probabilities may be quantized to save storage (a byte gives 256 bins of length 0.004 which could be sufficient).
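
Storing a probability in one byte might look like the sketch below. The choice of mapping each bin back to its midpoint and the function names are assumptions; the description only states that 256 bins of length about 0.004 could be sufficient.

```python
BINS = 256  # one byte


def quantize(probability):
    """Map a probability in [0, 1] to a bin index in 0..255."""
    return min(BINS - 1, int(probability * BINS))


def dequantize(bin_index):
    """Map a bin index back to the midpoint of its interval."""
    return (bin_index + 0.5) / BINS


stored = quantize(0.3)
recovered = dequantize(stored)
```

The round trip loses at most half a bin width, i.e. about 0.002.
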

[0029] In the second, exponential function, technique, the weight follows a curve composed of two exponential curves (see Figure 7). Thus increasing and decreasing the weights is reduced to a simple multiplication by a specific coefficient. Each association of a data item with the same keyword will follow the same curve depending on the initial weight.
[0030] For convenience and efficiency of computation it is preferable to store the value

[equation not reproduced in the source text]

instead of w itself. Hereinafter v is referred to as the "Relationship Coefficient". In addition the following notations are used:

$v_{ik}$: Relationship Coefficient of association [i,k]
$v_k^{0}$: Initial Relationship Coefficient for associations with keyword k
$v_k^{User}$: User Relationship Coefficient for associations with keyword k
$v_k^{GC}$: GC Relationship Coefficient for associations with keyword k
$n_k$: Number of data items indexed by keyword k in active associations

The following requirements can be easily ensured:

- weight limitation to represent a relevancy weight (i.e. $w_{ik} \in\ ]0,1[$ or $w_{ik} \in\ ]-1,0[$);

- we can control the number $N_{User}$ of interactions needed for a keyword in an association to become an active keyword (i.e. the total number of associations of that keyword with that data item needed to have $v_{ik} > v_k^{User}$);

- we can control the number $N_{GC}$ of negative interactions needed for a keyword in an association to become a candidate for garbage collection (i.e. the number of associations needed to have $v_{ik} < v_k^{GC}$);

- reversibility of updating (i.e. $v_{ik}$ increased once and decreased $n_k$ times returns to the initial value before the increase).
30 

[0031] The weight-updating procedure requires an initial setting for the relationship coefficient. For each association between a keyword and a data item this relationship coefficient will be increased or decreased. In the case of a new association, it will be entered into the store 22, and if this new association is made with a new keyword, this keyword will also be entered into the store 22.

[0032] The relationship coefficients for a keyword are first initialized depending on the number of data items indexed by the keyword. It is considered that the more data items one keyword indexes, the less relevant it should initially be to describe the data items. Thus $v_k^{0}$ can take the form:

[equation not reproduced in the source text]

For practical reasons it is undesirable for the initial relationship coefficient to be too high, so 2 is added to $n_k$ in the formula. This is arbitrary and different values could be used here.
[0033] For each part of the curve (Part 1 and Part 2, Figure 7) there is a respective increasing coefficient and a decreasing coefficient. These coefficients, as well as the initial relationship coefficient, are specific to each keyword.

Part 2 of the curve: when $v_{ik} \geq v_k^{0}$

[0034] If f is the function describing this part, with the two initial conditions $f(0) = v_k^{0}$ and the slope $f'(0) = s$, the following expression for f is obtained:

$$f(x) = v_k^{0} \exp\left(\frac{s\,x}{v_k^{0}}\right)$$

Increasing coefficient equation (calculated from f(x+1)):

$$v'_{ik} = C_k^{(1)} \cdot v_{ik}, \quad \text{with } C_k^{(1)} = \exp\left(\frac{s}{v_k^{0}}\right)$$

Decreasing coefficient equation (calculated from $f(x - 1/n_k)$):

$$v'_{ik} = C_k^{(2)} \cdot v_{ik}, \quad \text{with } C_k^{(2)} = \exp\left(-\frac{s}{n_k\,v_k^{0}}\right)$$

Part 1 of the curve: when $v_{ik} < v_k^{0}$

If g is the function describing this part, with the two initial conditions $g(0) = v_k^{0}$ and the slope $g'(0) = s$, the following expression for g is obtained:

$$g(x) = (v_k^{0} + 1) \exp\left(\frac{s\,x}{v_k^{0} + 1}\right) - 1$$

Increasing coefficient equation (calculated from g(x+1)):

$$v'_{ik} = C_k^{(3)} \cdot (v_{ik} + 1) - 1, \quad \text{with } C_k^{(3)} = \exp\left(\frac{s}{v_k^{0} + 1}\right)$$

Decreasing coefficient equation (calculated from $g(x - 1/n_k)$):

$$v'_{ik} = C_k^{(4)} \cdot (v_{ik} + 1) - 1, \quad \text{with } C_k^{(4)} = \exp\left(-\frac{s}{n_k\,(v_k^{0} + 1)}\right)$$

In Figure 7 the x axis represents the number N of associations between a keyword and a data item. The exact derivation of N is:

$$N = N^{+} - \frac{N^{-}}{n_k}$$

where $N^{+}$ represents the number of times the association [i,k] was made and $N^{-}$ the number of times the associations [j,k] were made for j ≠ i.
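
The reversibility requirement can be checked numerically for the upper part of the curve. The sketch below assumes that this part is the pure exponential fixed by the stated conditions f(0) = v0 and f'(0) = s, so that one step along x multiplies v by exp(s / v0) and a step of -1/n_k multiplies it by exp(-s / (n_k * v0)). The numeric values of `v0`, `s` and `n_k` and the function name are illustrative assumptions.

```python
import math


def part2_coefficients(v0, s, n_k):
    """Multiplicative update factors for the upper part of the curve.

    Assumes f(x) = v0 * exp(s * x / v0), which satisfies f(0) = v0 and
    f'(0) = s. An increase moves x by +1, a decrease by -1/n_k.
    """
    c_inc = math.exp(s / v0)
    c_dec = math.exp(-s / (n_k * v0))
    return c_inc, c_dec


v0, s, n_k = 0.25, 0.1, 4  # illustrative values only
c_inc, c_dec = part2_coefficients(v0, s, n_k)

v = v0 * c_inc          # one positive association
after_increase = v
for _ in range(n_k):    # n_k negative interactions
    v *= c_dec
```

After one increase and `n_k` decreases, `v` returns to `v0` up to floating-point rounding, which is the reversibility property listed among the requirements.
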

[0035] The User and GC relationship coefficients are derived as follows:

User relationship coefficient $v_k^{User}$

This represents the value of v taken after $N_{User}$ associations have occurred, without any decrease, since the association was created (i.e. since $v = v_k^{0}$).

GC relationship coefficient $v_k^{GC}$

This represents the value of v that would take the value of the initial relationship coefficient $v_k^{0}$ after $N_{GC}$ negative interactions have occurred, without any decrease.

[0036] The coefficient $n_k$ counts associations with the keyword k that are active, which means that the other coefficients $v_k^{0}$, $C_k^{(1)}$, $C_k^{(2)}$, $C_k^{(3)}$ and $C_k^{(4)}$ only relate to the active keywords. When the status of a keyword k in an association changes, the value of $n_k$ must be increased (candidate keyword to user keyword) or decreased (user keyword to candidate for GC keyword). Then all the other coefficients must be recalculated.

[0037] If a new association is made with a new keyword k, the system should initialise the value of $n_k$ to 1 and then calculate $v_k^{0}$, $C_k^{(1)}$, $C_k^{(2)}$, $C_k^{(3)}$ and $C_k^{(4)}$. Then the association is created between the data item and the keyword, and $v_{ik}$ gets the relationship coefficient value $v_k^{0}$.

[0038] When a new association is made but with a keyword k already in the system (i.e. indexing other data items), this new association is created and its relationship coefficient is initialised to $v_k^{0}$. As it is not an active keyword for that data item yet, it is not counted in $n_k$ so the other coefficients are not yet recalculated.

[0039] Some example scenarios incorporating the present invention will now be briefly described:

1) Indexing new data items: a batch of photographs has to be added to the image repository and there exists a community of picture indexers (these could be the users browsing the collection). Each indexer is given a randomly selected picture from the new batch and is asked to provide the keywords. These are added to the keyword set indexing the picture or, if the keyword is already present, the counter associated with it is incremented. Provided that users agree on a subset of keywords for a given picture, these would eventually emerge with the higher score.

2) Searching an indexed collection of data items: the collection of photographs is being searched with keywords by a large user community. The candidate photographs selected in accordance with the keywords are shown in thumbnail form, and once a thumbnail image is selected for viewing of the complete version at full size and resolution the search keywords modify existing indices by a small factor (learning rate). If the user subsequently chooses to purchase a copy of the image, the association of the search keywords with the image may accordingly be strengthened further. One could try to obtain the thesaurus automatically assuming that two subsequently entered keywords are related semantically (assuming keywords are nouns only). This is a very weak assumption and most of the pairs would constitute "noise" (i.e. they have a very small probability of being entered by another user) but consistently entered keyword pairs would emerge through the scoring procedure.
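
The thesaurus idea in scenario 2 can be sketched by counting keyword pairs entered one after the other. The function name `pair_scores`, the sample queries and the decision to treat a pair as unordered are assumptions for illustration.

```python
from collections import Counter


def pair_scores(queries):
    """Count pairs of keywords entered consecutively within each query.

    Each query is a list of keywords in the order the user typed them.
    Pairs are stored in sorted order so that (a, b) and (b, a) are merged.
    """
    pairs = Counter()
    for keywords in queries:
        for first, second in zip(keywords, keywords[1:]):
            pairs[tuple(sorted((first, second)))] += 1
    return pairs


queries = [["pig", "farm"], ["farm", "pig"], ["pig", "lion"]]
scores = pair_scores(queries)
```

The pair entered by two different users scores 2, while the one-off pair scores 1 and would remain in the noise.
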

[0040] Many applications for adaptive indexing exist. The World Wide Web provides one particularly attractive opportunity, since its user community is huge and diverse. People use the Web to search for information of any type and are sensitive to delays in the search so the quality of indexing is very important. The Web also changes very quickly as technologies evolve: there is a need for a maximum of dynamism as well as availability of the information.

[0041] Adaptive indexing might also be very useful for smaller user communities. A corporate user community can for instance train the search tool to use their own specialized vocabulary. Since the indexing is adaptive, the indexes can be specific or dedicated to a particular area.

[0042] This system would be extremely useful for image libraries, since automatic tools to index images are very difficult to produce. The way that we describe an image is also dependent on what we take into account in the image: it may be the elements which go into its composition, or the emotion that it provokes, for instance. An adaptive indexing tool will build a set of descriptors which reflect what the majority of people searching the image library think about an image, making it easily retrievable by this majority.

[0043] On one hand therefore this technique may be used to index the Web and make a data item easily reachable by a majority of people searching for it, and on the other hand it allows the use of a very restricted vocabulary for indexing in a small user community with rigid rules. The system adapts itself to the environment and can be moved smoothly from one environment to another.

[0044] In fact, this system tries to capture real life perception of objects in the environment. We all have different ways of describing something but the description that is most often used in the appropriate community can be considered as the democratic description. Thus an adaptive indexing system can act as a repository for human knowledge, and taking "snapshots" of the state of the system from time to time could allow cultures to be compared over time.

[0045] Although the system can adapt its indexes automatically without intervention, there is also the possibility that a manager of the system can set some parameters in accordance with the search capabilities needed:


- modification of the number of descriptors for the data. By modifying the threshold of the minimum weight or the total number of descriptors allowed, one can decide how wide the vocabulary will be.

- modification of the amplitude of the weight: the manager can choose whether it is appropriate to have weights which are very close together or far apart. This has to do with the strategy for training the system when a new vocabulary has to be built up, such as in the initial phase, or just after some event such as change of user community which is likely to bring in a significant number of new descriptors and make some of the old descriptors obsolete. We could start with weights close together so that links can be made easily between pieces of data and new descriptors, and later make the weights further apart as the vocabulary for the description stabilizes.

Claims

1. A method of refining descriptors associated with data items to enable retrieval thereof, comprising the steps of:

storing said data items in both a summary form and in a complete form;

receiving a search request from a user for selection of data items, said request incorporating at least one descriptor;

sending the user a search result comprising only the summary form of data items selected in accordance with said search request; and

using a response of the user requesting the complete form of a selected data item in the search result to guide modifications of the descriptors.

2. The method of claim 1, wherein the users' responses comprise explicit comments supplied by the users indicative of the utility of search results.

3. The method of claim 1 or claim 2, wherein the data items are any one or more of images, text, audio records or video records.

4. The method of claim 3, wherein the data items are images and the summary forms comprise thumbnail versions of the images.

5. The method of claim 1, wherein a further response of the user in selecting a further action in respect of the complete form of a data item after the complete form has been provided is also used to guide modifications of the descriptors.

6. The method of claim 1, wherein association between a data item and a descriptor has a weight indicating strength of that association.

7. The method of claim 6, wherein modification of the descriptors in accordance with user response includes modification of the weight of association.

8. The method of claim 6 or claim 7, wherein an association between a data item and a descriptor is assigned a first weight upon an initial occurrence of that data item with that descriptor, the weight is increased upon subsequent occurrences of that data item with that descriptor, and the weight is decreased upon subsequent occurrences of that data item without that descriptor.

9. The method of claim 8, wherein the descriptor becomes usable for retrieval of the data item when the weight reaches a first predetermined threshold, and the descriptor becomes no longer usable for retrieval of that data item if the weight falls to a second predetermined threshold.

10. The method of claim 8 or claim 9, wherein change in weight for successive said occurrences is determined in accordance with an exponential function.